Sociology 229:  Advanced Regression

 

Assignment #3:  EHA Basics

 

Due:  Start of class, February 2

 

This assignment requires a dataset on the course website entitled “Assignment 3 GSS2006subset.dta” and an accompanying do-file.

 

  1. Download the dataset (a Stata .dta file)
  2. Create your own “do” file that opens the data
  3. My syntax creates some variables and makes some survivor, hazard, and integrated hazard plots.  See if you can get that same syntax to run on your computer without error.  Make your own do-file; don’t just use mine!
    1. Note:  Don’t worry if you don’t understand the “stset” command.  We’ll discuss that later.
    2. Note 2:  I’ve created a dummy variable that identifies people born prior to 1960.  (I suspected that their timing of first childbirth might differ from that of people born more recently.)  That let me make plots that break out groups based on values of that dummy variable.
    3. I created several other variables that may be useful.
      1. NOTE:  “parentsincome” refers to family income at age 16.
  4. Examine survivor, hazard, and integrated hazard plots for two or more other interesting subgroups in the data.  You can either create your own new dummy variable, or use one that I created (such as gender or having “rich parents”).
  5. Run a basic Cox regression model looking at the effects of gender, race (white as omitted category), parents’ income, and mother’s education on the hazard rate of having a first child.
  6. Answer the questions below.
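
As a rough orientation for steps 2 through 5, a do-file might be organized along the lines below. This is only a sketch under stated assumptions, not the course do-file: apart from “dfemale” and “parentsincome”, every variable name here (agefirstkid, hadkid, dpre1960, dblack, dotherrace, maeduc) is a hypothetical placeholder, and the “stset” line in particular should be taken from the posted syntax rather than from this example.

```stata
* Minimal sketch. All variable names except dfemale and parentsincome
* are hypothetical placeholders; substitute the names in the dataset.
use "Assignment 3 GSS2006subset.dta", clear

* Declare the survival-time structure (hypothetical time and event variables)
stset agefirstkid, failure(hadkid)

* Survivor, hazard, and integrated (cumulative) hazard plots
sts graph, survival
sts graph, hazard
sts graph, cumhaz

* Hazard plot broken out by a dummy variable (hypothetical name)
sts graph, hazard by(dpre1960)

* Cox model; nohr displays coefficients instead of hazard ratios
stcox dfemale dblack dotherrace parentsincome maeduc, nohr
```

By default “stcox” displays hazard ratios; the “nohr” option displays the untransformed coefficients instead.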

 

 

Question 1:  Write a few sentences describing the survivor, hazard, and integrated hazard plots.  What do they tell you about the timing of childbirth in the US?  What is the overall shape?  When is the rate highest?  About what proportion never have a first child?  (4-5 sentences are sufficient, but you can write more if you wish.)

 

Question 2:  How does the timing of childbirth differ for people in the pre-1960 cohort (versus those born in 1960 or later)?

 

Question 3:  What categorical variable did you create?  Why did you expect those groups to differ in the timing of childbirth?  What did you observe in your plots?   Was it what you expected?

 

Question 4:  Summarize the findings of the Cox model.  Interpret the coefficient for “dfemale” by exponentiating to determine the impact of gender on the hazard rate of having a first child.
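
For reference, the general relationship used when exponentiating a Cox coefficient can be written as follows. The symbols are notation assumed here for illustration: β is the estimated coefficient on a dummy variable, HR is the hazard ratio, and the numeric value of β is hypothetical, not a result from this dataset.

```latex
\[
  \mathrm{HR} = e^{\beta},
  \qquad
  \text{percent change in the hazard} = 100 \times \left( e^{\beta} - 1 \right)
\]
% Hypothetical illustration: if \beta = 0.25, then e^{0.25} \approx 1.28,
% i.e. a hazard about 28\% higher for the group coded 1, other covariates held constant.
```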

 

Question 5:  Notice that I did not include a measure for the respondent’s income (or education or anything else that changes over time).  That is because the simple data structure of this dataset does not allow for independent variables to change over time.  We know an individual’s income at the time of the survey – which could be long after they actually had their first child.  Why might that cause problems for this analysis?  Or, more specifically, how might the inclusion of such a variable bias the results?

 

Turn in the following:

  1. A hazard plot of first childbirth, broken out by the variable of your choice (Step 4)
  2. Results of the Cox model
  3. Answers to the questions